-
-
Notifications
You must be signed in to change notification settings - Fork 18k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Fix freq setter for DatetimeIndex/TimedeltaIndex and deprecate for PeriodIndex #20772
Conversation
msg = ('Setting PeriodIndex.freq has been deprecated and will be ' | ||
'removed in a future version; use PeriodIndex.asfreq instead. ' | ||
'The PeriodIndex.freq setter is not guaranteed to work.') | ||
warnings.warn(msg, FutureWarning, stacklevel=2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if we really need to deprecate setting this: it's user facing, but I think the only case it works is if you set to the existing frequency. Could potentially remove the setter entirely, causing an AttributeError
to be raised, or could raise a custom AttributeError
within the setter to give a better message directing users to .asfreq
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, IMO it was buggy, and as you say, only if it was the same frequency, it was working. But given that corner case, it maybe does not hurt to do the deprecation I would say.
I think the custom AttributeError message is a good idea in the future.
Codecov Report
@@ Coverage Diff @@
## master #20772 +/- ##
==========================================
+ Coverage 91.77% 91.78% +<.01%
==========================================
Files 153 153
Lines 49313 49322 +9
==========================================
+ Hits 45259 45268 +9
Misses 4054 4054
Continue to review full report at Codecov.
|
I would also deprecate settig for DTI/TDI, and direct to using |
Can implement cc @jorisvandenbossche : Thoughts on this? Your preference in the issue was to allow setting for DTI/TDI and not implement |
Yes, -1 on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from the tz
question, this looks good IMO
return None | ||
|
||
on_freq = cls._generate( | ||
index[0], None, len(index), None, freq, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is generating the index needed to know if it a correct frequence? Checking the inferred frequency is not enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, see it was also done like that below (still wondering though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's necessary, at least to a certain extent. There are cases where multiple frequencies are valid for a given set of dates, so you can have a valid frequency that's not the inferred frequency, e.g. ['2018-01-01', '2018-01-02', '2018-01-03']
could be calendar day or business day. Users can also define custom frequencies that would need to be validated but would not be the inferred frequency.
There are cases where you don't need to generate the entire index to reject invalid frequencies, e.g. if the second generated value doesn't match. But I don't immediately see how to get around generating the entire index to determine if a frequency is valid. I suppose you could maybe optimize memory usage with huge indexes by doing some type of elementwise validation, only keeping the current generated element in memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes, I see, that's for sure a valid case.
For me, improving the performance of the validation is not very high priority.
inferred = subarr.inferred_freq | ||
if inferred != freq.freqstr: | ||
on_freq = cls._generate(subarr[0], None, len(subarr), None, | ||
freq, tz=tz, ambiguous=ambiguous) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
passing the tz
in the new version is not needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, and I don't think it was actually needed in the old version either. At the point at which this is being done, subarr
and it's elements should already be tz aware, and cls._generate
will infer tz
from subarr[0]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yep, indeed, that sounds correct (and tested it with a small example)
msg = ('Setting PeriodIndex.freq has been deprecated and will be ' | ||
'removed in a future version; use PeriodIndex.asfreq instead. ' | ||
'The PeriodIndex.freq setter is not guaranteed to work.') | ||
warnings.warn(msg, FutureWarning, stacklevel=2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, IMO it was buggy, and as you say, only if it was the same frequency, it was working. But given that corner case, it maybe does not hurt to do the deprecation I would say.
I think the custom AttributeError message is a good idea in the future.
I think this difference is acceptable, because |
(note: I tagged for 0.23, but it does not need to hold up the rc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this code change looks fine. I still think should deprecate setting the freq on DTI/TDI. I don't see much value there and the inconcistenty with that you cannot set PI is pretty jaring. Is there an actual good usecase for setting DTI/TDI? If there is we should actually do this via a function rather than an attribute to make it more explicit.
If this is going in 0.23, then I'd like it to be in the RC. Ideally we only merge bugfixes for the RC to the 0.23 release branch. |
I more or less have code ready that addresses @jreback's comments, but am a bit hesitant to push since there isn't agreement on if the freq setter should be deprecated, and if so what the name of the alternative method would be. |
This commit would go on top of the current changes in this PR to address @jreback's comments, in case anyone cares too look while things are still being discussed here: jschendel@2fca775 The CI services that are hooked to my repo all ran successfully, and I didn't see any warnings related to my changes in the logs. Didn't want to mess around with this PR too much while there's still ongoing discussion. |
So I have a clear -1 on re-using the name If we want to keep the possibility of setting the freq, I don't mind it being the direct setting of the freq attribute (I don't think it is worth to introduce another method for this)
I don't mind allowing setting on a PI. But if we follow the same rules as for setting |
Can you repeat I can't find what you said. I am fine the the PI change here. The issue comes down to whether we should allow one to change |
I have said it here #20772 (comment) and here #20678 (comment). But it actually just boils down to "for PI
There is nothing "plain wrong" about setting an attribute, it is perfectly valid and depending on the use case often even preferable python code (I think often having a lot of "set_.." methods is seen as "old-java-style" while just having a property with getter and setter under the hood is more pythonic). |
On an immutable object this is True. We have 1 and only 1 attribute that we allow to be changed on an Index and that is the name. All others give you back a new object. Why break this paradigm? Further I don't buy your argument. Whether Is there a compelling use-case to actually set freq (whether inplace or return a new object) on a DTI/TDI? |
I made this PR after noticing a few SO questions that worked around the buggy setting behavior. That doesn't necessarily there's actually a valid use case for it, as there may be more direct workarounds for the issues people are having, but at the very least people are trying to use it. I think one of the issues is that people generally default to using I could also imagine a use case where one has messy data which needs to be cleaned first, prior to actually setting the
To add another data point, Not particularly invested in being for/against reusing the |
@jschendel thanks for the color. But why are people actually setting the |
Setting the In [2]: dti = pd.DatetimeIndex(['20180101', '20180102', '20180103'])
In [3]: dti
Out[3]: DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03'], dtype='datetime64[ns]', freq=None)
In [4]: dti.shift(1)
---------------------------------------------------------------------------
NullFrequencyError: Cannot shift with no freq
In [5]: dti + 4
---------------------------------------------------------------------------
NullFrequencyError: Cannot shift with no freq
In [6]: dti = pd.DatetimeIndex(['20180101', '20180102', '20180103'], freq='B')
In [7]: dti.shift(1)
Out[7]: DatetimeIndex(['2018-01-02', '2018-01-03', '2018-01-04'], dtype='datetime64[ns]', freq='B')
In [8]: dti + 4
Out[8]: DatetimeIndex(['2018-01-05', '2018-01-08', '2018-01-09'], dtype='datetime64[ns]', freq='B') I've seen people use operations like this to enable some shorthand for inserting the previous/next logical value into an index, e.g. |
right, we allow an integer to stand in an represent the freq. ok I would say these are ok usecases (if prob not very well documented). but that is not sufficient reason to actually allow setting of an attribute on an immutable object (rather than just having a function give you a new one). its just bad semantics. I would be ok with
|
Exactly because the
For me that is another reason not to use |
ok, so I would be on board with deprecating This would be much more in line with our handling of names in any event. The PI changes are fine. Agree that |
@jreback Are you fine with merging this now as is? The changes it does are already good (fixing a clear bug in the current setting functionality + deprecation for PI), and the other changes you propose in your last comment can be done separately I think (on top of this PR). (just as way to include this in the rc) |
yes ok to merge now (and let’s open an issue though) |
Opened #20886 for the rest. |
Thanks all. |
git diff upstream/master -u -- "*.py" | flake8 --diff
For
DatetimeIndex
/TimedeltaIndex
:freq
with a string alias now worksFor
PeriodIndex
:freq
is deprecated in favor ofasfreq
asfreq
paramsfreq
setter behavior as-isPeriodIndex
anyways